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An implicit purification sclieme is proposed for cal- 
culation of the temperature-dependent, grand canon- 
ical single-particle density matrix, given as a Fermi- 
Dirac operator expansion in terms of the Hamilto- 
nian. The computational complexity is shown to scale 
with the logarithm of the polynomial order of the ex- 
pansion, or equivalently, with the logarithm of the 
inverse temperature. The system of linear equations 
that arise in each implicit purification iteration is 
solved efficiently by a conjugate gradient solver. The 
scheme is particularly useful in connection with linear 
scaling electronic structure theory based on sparse 
matrix algebra. The efficiency of the implicit tem- 
perature expansion technique is analyzed and com- 
pared to some explicit purification methods for the 
zero temperature density matrix. 

Linear scaling electronic structure theory in combina- 
tion with tight-binding, self-consistent Hartree-Fock or 
density functional theory has become a very powerful 
tool for studying complex large material systems [1,2]. 
There are several ways to achieve a computational cost 
that scales linearly with system size. Here we focus on 
methods based on the single-particle density matrix for 
band-gap materials, the matrix elements of which decay 
exponentially with overlap distance. For large systems 
the number of matrix elements above some numerical 
threshold therefore scales as 0{N). In these schemes the 
two major steps are the construction of the tight-binding, 
Fockian or Kohn-Sham Hamiltonian H{r, r') and the cal- 
culation of the density matrix p{r, r'). The present article 
concerns aspects of the second problem, the construction 
of the density matrix. 

The relation between the density matrix at T = and 
the Hamiltonian is given by the Heaviside step function 
[3] 

p^9{fiI-H). (1) 

In density matrix schemes this relation is approximated 
by constructing p from H using sparse matrix algebra, 
where each major operation computationally scales lin- 
early with the system size, thanks to 0{N) matrix spar- 
sity. This can be achieved through constrained min- 
imization schemes [4,5], spectral projections or purifi- 
cation methods [6~12], or by an expansion of the tem- 
perature dependent Fermi-Dirac function or similar step 
function approximations [13-18]. Contour integral repre- 
sentations of the Fermi distribution with complex Fade 



polynomials as resolvents, that can be calculated 0{N) 
implicitly [19], as well as combinations of various ap- 
proaches have also been explored [20-25]. 

The quadratically convergent purification techniques 
have turned out to be some of the most efficient ap- 
proaches for the construction of the density matrix, 
both in memory and speed [8,25,11,12], with a com- 
putational complexity, in terms of number of matrix- 
matrix multiplications necessary to reach convergence, 
that scales linearly with the logarithm of the inverse 
band gap and the degree of expansion, and with a 
numerical error that scales linearly with the thresh- 
old [11,12]. The Fermi-Dirac operator expansion-based 
methods based on Chebychev expansion techniques are 
generally much slower, with the computational cost scal- 
ing at best with the square root of the degree of expan- 
sion [26,18]. However, these methods have an impor- 
tant advantage; they can account for a finite tempera- 
ture distribution of the density matrix. Here we propose 
an expansion scheme that combines the low logarithmic 
complexity and quadratic convergence of the purification 
schemes with the finite temperature Fermi-Dirac distri- 
bution. We show how this can be accomplished by an 
implicit purification scheme, based on a Fade approxima- 
tion of the rescaled Fermi-Dirac function, with a compu- 
tational complexity that scales logarithmically with the 
expansion order, or equivalently, the inverse temperature. 

The Fermi-Dirac distribution [3] , 

$FD(e,M,/3) = ;^(7Z^y^, (2) 

occurs in statistical mechanics as the occupational dis- 
tribution of fermions at finite temperatures. It converges 
to a step function with the step formed at the chemical 
potential p when T 0. The temperature dependent 
grand canonical density matrix is formally given by the 
operator relation 

piP) = 'S>MH,P,P). (3) 

The single-particle energy of a fermion system at a 
finite temperature is given by 

= Tr[Hpm - ^(0,|77|0,)(0,|p(/3)|</.,), (4) 

in some set of basis functions 4n. In this formulation the 
expression for p{(3) does not have to be calculated explic- 
itly; instead the Fermi-Dirac function can be expanded in 
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Chcbychcv polynomials. The Chebychev expansion tech- 
nique is one of the most efficient ways of approximating 
a function and the Chebychev functions T„(a;) obey a 
simple two-step recurrence formula, (Tq = 1, Ti = x), 



Tn+l{x) = 2xTn{x) - Tn-l{x). 



(5) 



The products ^fu{H, n, /3)\(f)i) can be calculated effi- 
ciently using the two-step recurrence formula using only 
matrix- vector multiplications [13-17]. The Chebychev 
expansion technique has many advantages: for example, 
the costly matrix-matrix multiplications arc avoided, and 
error accumulation is small. However, compared to linear 
scaling purification techniques it is fairly inefficient [25]. 
The problem is the slow linear increase in polynomial or- 
der as a function of iterations in the two-step recurrence 
formula. To improve the efficiency, Liang et al. [18] re- 
cently suggested an alternative approach, where the ex- 
pansion polynomials are not calculated by the two-step 
formula, but by a direct expansion. This can be achieved 
with a computational complexity that scales with the 
square root of the polynomial order of the expansion 
0{^/n). This limit is optimal for a general polynomial 
[26]. However, by choosing the expansion with a par- 
ticular set of Pade polynomials, we will show how the 
computational complexity can be reduced even further, 
scaling only linearly with the logarithm of the polynomial 
order C'(logn) or the inverse temperature 0{logP). 

There are at least 19 different ways to calculate matrix 
exponentials [27]. Here we use one particular technique 
based on a Pade approximation. Consider the exponen- 
tial function 



A Taylor expansion to first order gives 

„ /2n + x^ 

e = lim 

n^oo \ 2n — X 



(6) 



(7) 



This Pade approximation can be used in the Fermi-Dirac 
function and for the rcscalcd chemical potential and in- 
verse temperature [3], fi' — 1/2 and /?' — 4n, 



1 (1 - .x)" 
^FD(a;, -,4n) w — . 

^ '2 ^ X" (1 - x)" 



(8) 



At higher values of n the approximation becomes increas- 
ingly better. The choice fi' = 1/2 is made to center the 
step of the Fermi-Dirac function at a; = 1/2. In the inter- 
val [0, 1] the approximation is a continuously decreasing 
function with a maximum of 1 at a; = and a minimum 
of at a: = 1 . This is the interval in which the tempera- 
ture dependent density matrix has its eigenvalues and it 
is the interval where the Fermi-Dirac distribution is well 
approximated already at fairly high temperatures. It is 
therefore the interval around which we chose to perform 



the expansion. This choice requires an initial rescaling of 
the Hamiltonian spectra around the interval [0, 1]. Let 



Gn{x) 



+ (1 - a;)" 



then the Fermi-Dirac function for jjl = 1/2 and f3' 
in the interval [0, 1] is approximated by 



(9) 



4n 



<&FD(a;, -,4n) = 



„4n(a:-l/2) 



1 



1-G„(a;). (10) 



The polynomial order n of the Pade approximation is 
proportional to the inverse temperature since n = /3'/4- 
This means that the lower the temperature the better the 
approximation. In practice, however, the approximation 
at the normalized energy interval [0, 1] is already very 
good at orders as low as n « 5. An example given in 
Fig. 1, which shows the Pade approximation 1 — G^{x), 
is virtually identical to the corresponding Fermi-Dirac 
function with /?' = 20 and ii' = 1/2. The inset shows the 
error. With the interval [0, 1] equal to 1 Ry this example 
corresponds to a temperature of 7894 K. At lower tem- 
peratures the approximation becomes increasingly bet- 
ter. The Pade approximation in Eq. (10) is only one 
alternative, but, as will be shown below, it turns out to 
be particularly simple and efficient. 

A major advantage with the Pade approximation in 
Eq. (10) is how efficiently we can calculate high orders of 
G„. The computational complexity is very low thanks to 
the iterative relation 



Gk^i{x) = Gk{Gi{x)). 



(11) 



In an operator expansion this corresponds to purifica- 
tions, projecting the eigenvalues towards and 1. In 
contrast to a more general polynomial expansion such as 
the Chebychev expansion, which computationally scales 
at best with the square root of the polynomial order, 
0{y/ri) [18,26], or as 0{n) if the two-step recurrence for- 
mula is used, the iterative relation above makes it possi- 
ble to reach the same order of expansion in only O(logn) 
steps. The same low logarithmic complexity is found gen- 
erally in purification expansion schemes that are based 
on iterative spectral projections. The Pade approxima- 
tion of the Fermi-Dirac distribution can thus be used in a 
highly efficient purification scheme which calculates the 
finite temperature density matrix p{0) at a set of nor- 
malized inverse temperatures (3' = An. The purification 
algorithm can be described by 



Xi =F{H,^x) 

Xi+i = Gm{Xi), i = l,2,. 



(12) 



The expansion order n and thus the normalized inverse 
temperature (3' must be chosen so that the number of 
iterations log^(n), is an integer. The function 
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F{H, n) = a{H - III) + 0.5/ (13) 

is a normalization function that rcscalcs all the eigenval- 
ues of H to the interval [0, 1], with the chemical poten- 
tial /i shifted to /x' = 1/2. The chemical potential and 
spectral bound must thus be known in advance. The 
normalization factor 

a W ^ min [{fl - i/min)~\ (-ffmax - At)~^] , (14) 

rescales the spectra and sets the temperature scale. The 
implicit purification scheme converges to the zero tem- 
perature density matrix for any value of a > 0, but the 
convergence is faster and the approximation is more ac- 
curate at higher temperatures if a is chosen to normalize 
the spectra aroimd [0, 1]. The spectral bounds i/max and 
Hmin can be estimated by for example Lanczos' algorithm 
or Gersgorin circles. Generally we have that the temper- 
ature [3] T — l/{akB4n), where n is the accumulated 
expansion order in Eq. (12). 

Because of the rational form of Gm{Xi) the purification 
scheme is implicit. Assuming a finite orthogonal basis 
representation, a set of linear equations in Eq. (12) has 
to be solved in each step for i = 1, 2 . . . , log„(n): 

[xr + (/-x,)'"]x,+i = xr, (15) 

which is given from the second step in Eq. (12) and from 
the definition of G in Eq. (9) along with its nested it- 
erative expansion property given in Eq. (11). Here we 
find another major advantage with our particular choice 
of Pade approximation. The left side system matrix 
Ai = [X^ + (/ — Xi)"^] is symmetric and positive defi- 
nite for symmetric Xj's with their spectra belonging to 
[0,1]. In fact, with increasing i, the system matrix Ai 
converges to the identity matrix /. The implicit equa- 
tions are therefore very well suited for solutions with the 
linear conjugate gradient method [28], that in turn, can 
efficiently exploit the close approximation of Xi to the 
unknown columns of Xi^i, which becomes increasingly 
more accurate and efficient towards the last iterations. 
Another possibly efficient alternative is the application 
of the sparse approximate inverse (AINV) [29,20] that 
can be expected to work well for this particular prob- 
lem. However, this approach has not been explored in 
the present study. 

Alternative implicit purification schemes can also be 
derived from various sign matrix expansions [30,31]. Sign 
matrix expansions are equivalent to purification. The 
only difference is that spectral projections are performed 
in the interval [—1, 1] instead of [0, 1], as in the case of 
purification. 

To analyze the efficiency of the algorithm compared 
with explicit purification schemes, we have chosen an 
N X N model Hamiltonian with N random diagonal el- 
ements. The overlap elements decay exponentially as a 



function of site separation on a randomly distorted sim- 
ple cubic lattice. This model represents a Hamiltonian 
of an insulator that might occur, for example, with a 
Gaussian basis set in density functional theory or in var- 
ious tight-binding schemes. The convergence is mainly 
determined by the occupation and the band gap. The 
test Hamiltonian was therefore modified such that all 
N eigenvalues were uniformly distributed on [0,1]. In 
this case the band gap Ag = 1/N, independently of 
the fractional occupation /occ = N^/N . This simpli- 
fies the analysis and comparison of the different meth- 
ods, which otherwise are hard to perform for a less ide- 
alized set of material systems. After each iteration a 
numerical threshold r = 1.0 x 10^'' was applied and 
convergence was determined when the error in energy 
l-Bapprox - -Boxactl < 1-0 X 10"''' [3]. The Convergence 
criterion corresponds in practice to T = 0. A compari- 
son at T Ri is necessary since the explicit purification 
schemes used in the comparison only give the zero tem- 
perature density matrix. At room temperature the com- 
putational effort with the implicit purification scheme is 
only slightly reduced because of the rapid convergence. 
The computational complexity was measured in num- 
ber of matrix-matrix multiplications, where N conjugate 
gradient steps, i.e. N matrix-vector multiplications, are 
counted as one matrix-matrix multiplication. 

Figure 2 shows the computational cost for various oc- 
cupation factors. The implicit purification scheme of or- 
der two (IP), i.e. with to = 2 in Eq. (15), using Xi as 
initial approximations to Xj+i , is compared to the trace 
correcting scheme with second order polynomials (TC2) 
by Niklasson [11], the trace resetting asymmetric fourth 
order method (TRS4) by Niklasson et al. [12], the grand 
canonical scheme with fourth order projections (GC4) by 
Niklasson [11], the grand canonical McWeeny purification 
scheme (McW) [6,8], and finally the canonical scheme 
(PM) by Palser and Manolopolous [8] . The grand canon- 
ical schemes that require prior knowledge of the chemical 
potential are indicated in the figure by bold italics. 

For small band gaps, i.e. high values of TV, and with 
prior knowledge of /i, the asymmetric GC4 method is the 
most efficient technique. The best performing schemes 
that require no prior knowledge of /i are the TC2 and 
TRS4 schemes. The TC2 scheme is more memory effi- 
cient since it only needs to calculate a second order poly- 
nomial in each iteration and intermediate storage needed 
in higher order expansions is avoided. However, it can 
not deal with degeneracy and fractional occupancy, which 
are addressed with the TRS4 scheme [12]. At low occu- 
pation the PM scheme becomes very inefficient. This 
sensitivity is not seen for any of the other schemes. The 
proposed implicit purification scheme is slower than the 
alternative explicit purification schemes except for the 
PM scheme at low occupancies. However, it is the only 
method that, for only a slightly increased computational 
cost, correctly gives the temperature dependent Fermi- 
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Dirac distribution of the single-particle eigenstates. The 
implicit purification scheme scales with the logarithm of 
the expansion order. This is an important improvement 
over previous Fcrmi-Dirac operator expansion methods. 
Whereas in general, a polynomial can be calculated with 
a computational cost scaling at best with the square root 
of the order of the polynomial [26], we restrict the poly- 
nomial approximation to a nested form /(/(. . . f{x) ...)). 
In this case a high order can be reached much more ef- 
ficiently than for the general form. This is the key idea 
behind purification expansions. 

In summary, we have proposed an implicit purifica- 
tion scheme for the calculation of the temperature de- 
pendent single-particle density matrix given as a Fermi- 
Dirac operator expansion in terms of the Hamiltonian. 
The method is useful in connection with linear scaling 
electronic structure theory and it has a computational 
complexity that scales with the logarithm of the inverse 
temperature 0{\og(3) or as the logarithm of the polyno- 
mial expansion order C'(logn). 

Discussions with Matt Challacombe, Eric Chisolm, 
Siobhan Corish, S. Goedecker, C. J. Tymczak, and John 
Wills are gratefully acknowledged. 
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FIG. 1. The Fermi-Dirac distribution (dashed line) com- 
pared to the approximation 
1 — G5{x) (circles) in Eq. (10). The inset shows the error, 
Error x 10^ = (*FD(a;, 1/2, 20) - [1 - Gsix)]) x 10^. 

FIG. 2. Computational cost for various schemes at T « 0. 
Grand canonical schemes requiring prior knowledge of n are 
written with bold italics. The N eigenvalues are uniformly 
distributed in [0, 1] and the band gaps are therefore Ag = 1/N 
independent of the fractional occupation /occ = Ne/N. 
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